Minor changes to support benchmarking #138

humpydonkey · 2024-06-17T16:31:05Z

Minor changes to support benchmarking

Tweak the tester prompt so that the testing function returns the output of the function being tested
Add a get_main_result() method to the Execution class
Update langsmith to the latest version

dillonalaird · 2024-06-17T17:51:01Z

vision_agent/agent/vision_agent_prompts.py

@@ -179,6 +179,8 @@ def find_text(image_path: str, text: str) -> str:
 8. DO NOT use try except block to handle the error, let the error be raised if the code is incorrect.
 9. DO NOT import the testing function as it will available in the testing environment.
 10. Print the output of the function that is being tested.
+11. Use the output of the function that is being tested as the return value of the testing function.
+12. Run the testing function in the end and don't assign a variable to its output.
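
To illustrate why instruction 12 matters, here is a minimal sketch. It assumes a notebook-style executor that treats the value of a script's final bare expression as its result; the PR does not show how the benchmark harness captures output, so `run_and_get_last_value`, the stubbed `find_text`, and the test scripts are all hypothetical.

```python
import ast
from typing import Any, Optional


def run_and_get_last_value(source: str) -> Optional[Any]:
    """Run `source`; return the value of its trailing bare expression, else None.

    Hypothetical stand-in for a notebook-style executor, not the project's code.
    """
    tree = ast.parse(source)
    namespace: dict = {}
    last_expr = None
    # Only a bare expression statement at the very end yields a value.
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        last_expr = ast.Expression(tree.body.pop().value)
    exec(compile(tree, "<test>", "exec"), namespace)
    if last_expr is None:
        return None
    return eval(compile(last_expr, "<test>", "eval"), namespace)


# A test script shaped by instructions 10 and 11: print the output, then return it.
COMMON = '''
def find_text(image_path, text):
    # Stub for illustration only.
    return f"found {text!r} in {image_path}"

def test_find_text():
    result = find_text("image.png", "hello")
    print(result)      # instruction 10
    return result      # instruction 11

'''

follows_rule_12 = COMMON + "test_find_text()\n"           # bare call at the end
assigns_output = COMMON + "output = test_find_text()\n"   # violates instruction 12

print(run_and_get_last_value(follows_rule_12))  # found 'hello' in image.png
print(run_and_get_last_value(assigns_output))   # None
```

Under this assumption, assigning the call to a variable turns the last statement into an assignment, so the executor has no value to report and the result is null.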


I'm trying to think of ways to keep the list of instructions shorter, as I suspect that the longer the list gets, the worse the model follows the directions.

Would this also work if we removed instruction 10 (it seems to accomplish the same thing as 11) and combined 11 and 12 into "Return the output of the function that is being tested in the test script, do not assign it to another variable" or something shorter?

humpydonkey · 2024-06-18 (edited)

I tried a few approaches, and this is actually the best prompt so far. The other approaches significantly increase the chance of the testing function returning null.
Here are the approaches I have tried:

Removing 10
Merging 11 and 12 into one instruction
Rewriting 11 more concisely
Moving 11 and 12 up to be the 3rd and 4th instructions
Moving all instructions up
Some combinations of the above

From my observation, the length of the list is not a problem, but shortening it can cause a big one: the rate of null values increased from 11% to 80% in my benchmark.
We probably want to rewrite/revisit this entire prompt at some point. It's a bit brittle now.
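
For reference, a null rate like the one quoted above can be computed as the fraction of benchmark runs whose testing function returned nothing. This is only a sketch: the `null_rate` helper and the toy results are hypothetical and are not taken from the actual benchmark.

```python
from typing import Any, Optional, Sequence


def null_rate(results: Sequence[Optional[Any]]) -> float:
    """Fraction of runs whose testing function returned None."""
    if not results:
        raise ValueError("no results to score")
    return sum(r is None for r in results) / len(results)


# Toy data for illustration only: 3 of 5 runs returned None.
toy_results = ["cat", None, "dog", None, None]
print(f"{null_rate(toy_results):.0%}")  # 60%
```

Comparing this number across prompt variants on the same set of benchmark cases is one way to check whether a shorter prompt is a regression.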

vision_agent/utils/execute.py

dillonalaird

LGTM

humpydonkey added 5 commits June 17, 2024 09:29

a72084c  Changes to support benchmarking
188c003  Fix lint
f4910fa  Empty-Commit
894884e  Replace pkg_resources with importlib
919fee7  Empty-Commit

dillonalaird requested changes Jun 17, 2024

dillonalaird approved these changes Jun 18, 2024

AsiaCao merged commit 6d2223b into main Jun 18, 2024
7 checks passed

AsiaCao deleted the support-benchmark branch June 18, 2024 19:10
